    Privacy Amplification by Decentralization

    Get PDF
    Analyzing data owned by several parties while achieving a good trade-off between utility and privacy is a key challenge in federated learning and analytics. In this work, we introduce a novel relaxation of local differential privacy (LDP) that naturally arises in fully decentralized algorithms, i.e., when participants exchange information by communicating along the edges of a network graph without a central coordinator. This relaxation, which we call network DP, captures the fact that users have only a local view of the system. To show the relevance of network DP, we study a decentralized model of computation where a token performs a walk on the network graph and is updated sequentially by the party who receives it. For tasks such as real summation, histogram computation, and optimization with gradient descent, we propose simple algorithms on ring and complete topologies. We prove that the privacy-utility trade-offs of our algorithms under network DP significantly improve upon what is achievable under LDP (sometimes even matching the utility of the trusted curator model), showing for the first time that formal privacy gains can be obtained from full decentralization. Our experiments illustrate the improved utility of our approach for decentralized training with stochastic gradient descent.
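
    A minimal sketch of the token-walk model described above, on a ring topology: each party adds its locally noised value to the token and only ever sees the token's aggregate state, never individual contributions. The function name and Gaussian noise scale are illustrative assumptions, not the paper's exact algorithm or calibration.

```python
import numpy as np

def ring_token_sum(private_values, sigma=1.0, rng=None):
    """Token walk on a ring: each party adds its noised value to the token.

    Sketch only: `sigma` is an illustrative noise scale, not calibrated
    to a specific (epsilon, delta) budget.
    """
    rng = rng or np.random.default_rng(0)
    token = 0.0
    for x in private_values:
        # Each party adds local Gaussian noise before contributing, so the
        # next party cannot recover x exactly from the token it receives.
        token += x + rng.normal(0.0, sigma)
    return token

# Usage: estimate the sum of 100 parties' privately held values.
values = np.random.default_rng(1).uniform(0, 1, size=100)
print("true sum:", values.sum(), "private estimate:", ring_token_sum(values))
```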

    Differential privacy: how should privacy be quantified in the world of machine learning?

    No full text
    The collapse in the price of data storage, the growing reach of computing and the data collection that accompanies it, and the increase in information-processing capacity are all major technical upheavals in the information domain. Whether one speaks of Big Data or simply considers the consequences of digitization during the health crisis of the past two years, the widespread collection of sensitive data is a new challenge for our society. For example, a phone typically collects its user's instantaneous location, relationships, sleep schedule, queries, and other health data. The need to secure data and prevent leaks, whether malicious or not, is therefore a key challenge of the digital transition. But how can privacy be guaranteed? The concept has many facets: obfuscation, the right to be forgotten, anonymity, confidentiality, data minimization. In machine learning, one metric has established itself, both in research and in the applications of the GAFAM companies, for quantifying the privacy level of a given process. Differential privacy is a mathematical definition that reduces to a single real number the degree to which a piece of data persists in the outputs of an algorithm. This thesis describes the emergence of this quantification and the factors that contributed to its success, as well as the implicit consequences of the definition for what is expected of machine learning and for the relationship between individuals and their data. We thus discuss the evolution of the notion of privacy in the face of new technical realities, put the definition of differential privacy in context as a quantification technique, and analyze its variants as limits of the original definition.
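
    For reference, the mathematical definition alluded to above is the standard (ε, δ)-differential privacy guarantee, whose ε is the single real number in question: a randomized mechanism M satisfies it if, for all datasets D and D′ differing in one individual's record and all sets of outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
```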

    From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning

    No full text
    We study differentially private (DP) machine learning algorithms as instances of noisy fixed-point iterations, in order to derive privacy and utility results from this well-studied framework. We show that this new perspective recovers popular private gradient-based methods like DP-SGD and provides a principled way to design and analyze new private optimization algorithms in a flexible manner. Focusing on the widely used Alternating Direction Method of Multipliers (ADMM), we use our general framework to derive novel private ADMM algorithms for centralized, federated, and fully decentralized learning. For these three algorithms, we establish strong privacy guarantees leveraging privacy amplification by iteration and by subsampling. Finally, we provide utility guarantees using a unified analysis that exploits a recent linear convergence result for noisy fixed-point iterations.
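
    A minimal sketch of the noisy fixed-point template described above: iterate x_{k+1} = T(x_k) + noise. Instantiating T as a gradient step recovers a simplified (unclipped) DP-SGD-like method; all names and constants are illustrative assumptions, not the paper's calibrated algorithms.

```python
import numpy as np

def noisy_fixed_point(T, x0, sigma, n_iters, rng=None):
    """Run the noisy fixed-point iteration x <- T(x) + Gaussian noise.

    Sketch only: `sigma` is illustrative and not calibrated to a formal
    privacy budget.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = T(x) + rng.normal(0.0, sigma, size=x.shape)
    return x

# With T(x) = x - eta * grad f(x) for f(x) = ||x - a||^2 / 2, the scheme is
# a simplified, unclipped DP-SGD-like iteration that converges near a.
a = np.array([1.0, -2.0])
grad_step = lambda x: x - 0.1 * (x - a)
print(noisy_fixed_point(grad_step, np.zeros(2), sigma=0.01, n_iters=500))
```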

    Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging

    Full text link
    Decentralized optimization is increasingly popular in machine learning for its scalability and efficiency. Intuitively, it should also provide better privacy guarantees, as nodes only observe the messages sent by their neighbors in the network graph. But formalizing and quantifying this gain is challenging: existing results are typically limited to Local Differential Privacy (LDP) guarantees that overlook the advantages of decentralization. In this work, we introduce pairwise network differential privacy, a relaxation of LDP that captures the fact that the privacy leakage from a node u to a node v may depend on their relative position in the graph. We then analyze the combination of local noise injection with (simple or randomized) gossip averaging protocols on fixed and random communication graphs. We also derive a differentially private decentralized optimization algorithm that alternates between local gradient descent steps and gossip averaging. Our results show that our algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, matching the privacy-utility trade-off of the trusted curator, up to factors that explicitly depend on the graph topology. Finally, we illustrate our privacy gains with experiments on synthetic and real-world datasets.
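
    A minimal sketch of the "local noise injection + gossip averaging" pattern described above, assuming one-shot Gaussian noise and a fixed ring graph with a doubly stochastic gossip matrix; the names and noise scale are illustrative, not the paper's Muffliato algorithm as specified.

```python
import numpy as np

def noisy_gossip_average(x, W, sigma, n_rounds, rng=None):
    """Each node noises its value once, then runs synchronous gossip x <- W x.

    W must be doubly stochastic so the (noisy) average is preserved at
    every round; sketch only, `sigma` is an illustrative noise scale.
    """
    rng = rng or np.random.default_rng(0)
    z = x + rng.normal(0.0, sigma, size=x.shape)  # one-shot local noise
    for _ in range(n_rounds):
        z = W @ z  # each node mixes only with its graph neighbors
    return z

# Usage: 4 nodes on a ring, each averaging with its two neighbors.
n = 4
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25
x = np.array([0.0, 1.0, 2.0, 3.0])
print("target mean:", x.mean(),
      "gossip result:", noisy_gossip_average(x, W, 0.05, 50))
```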

    FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings

    No full text
    Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models without centralizing data. The cross-silo FL setting corresponds to the case of a few (2 to 50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results, and re-use the different components for their research. FLamby is available at www.github.com/owkin/flamby.
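
    The abstract does not show FLamby's API, so rather than guess it, here is a minimal sketch of FedAvg, the canonical "standard FL algorithm" such a benchmark would run: each silo takes a few local gradient steps from the current global model, and a server averages the resulting models. All names and data below are hypothetical.

```python
import numpy as np

def fedavg(client_data, w0, rounds=20, local_steps=5, lr=0.1):
    """Minimal FedAvg on a linear least-squares task.

    `client_data` is a list of (X, y) silos. Each round, every client runs
    a few local gradient steps from the current global model, and the
    server averages the updated models. Illustrative only, not FLamby's API.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for X, y in client_data:
            w_local = w.copy()
            for _ in range(local_steps):
                grad = X.T @ (X @ w_local - y) / len(y)  # least-squares gradient
                w_local -= lr * grad
            local_models.append(w_local)
        w = np.mean(local_models, axis=0)  # server-side aggregation
    return w

# Usage: 3 hypothetical silos drawn around the same ground-truth model.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
silos = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    silos.append((X, X @ w_true + 0.1 * rng.normal(size=50)))
print(fedavg(silos, np.zeros(2)))
```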
